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Abstract. The main topic of this contribution is the problem of count- 
ing square- free numbers not exceeding n. Before this work we were able to 
iy-\ ■ do it in timqj 0{y/n). Here, the algorithm with time complexity 0(n 2 ' 5 ) 

£\\ ' and with memory complexity (^(n 1 ' 5 ) is presented. Additionally, a par- 

allel version is shown, which achieves full scalability. 
As of now the highest computed value was for n = 10 lr . Using our 
implementation we were able to calculate the value for n = 10 36 on a 
cluster. 
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1 Introduction 



>. 

A square-free number is an integer which is not divisible by a square of any 

integer greater than one. Let S(n) denote the number of square-free positive 
integers less or equal to n. We can approximate the value of S(n) using the 



asymptotic equation: 

l> ' 6 r- 

O" S(n) = -^n + 0(Vn). 

Under the assumption of the Riemann hypothesis the error term can be further 
reduced [3]: 

S(n) = -^n + 0(n 17 / 54+ "). 

b ! 

Although these asymptotic equations allow us to compute S(n) with high accu- 
racy, they do not help to compute the exact value. 

The basic observation for efficient algorithms is the following formula. 

Theorem 1. 

Lv^J 

S(n)=J2Kd)[^\ , (1) 

d=l 

where n(d) is the Mobius function. 



1 Comparing to the Big-0 notation, Soft-O (O) ignores logarithmic factors. 



The simple proof of this theorem using the inclusion-exclusion principle is pre- 
sented in App. [5] The same proof can be found in [3]. It allows the author to 
develop an 0(yfn) algorithm and to compute S^IO 17 ). In Sect. [5] we show de- 
tails of this algorithm together with the reduction of the memory complexity to 

To construct a faster algorithm we have to play with the summation (TTJ) . In 
Sect. l3.il the new formula is derived and stated in Theorem[2] Using this theorem 
we are able to construct the algorithm working in time 0(n 2 ' 5 ). It is described 
in the rest of Sect. [3] However, to achieve a memory efficient procedure more 
research is required. The memory reduction problem is discussed in Sect.|4j where 
the modifications leading to the memory complexity 0{n}' 5 ) are presented. The 
result is put into Algorithm [4] 

Applying Algorithm [4] for huge n leads to computing time measured in years. 
Therefore, a practical algorithm should be distributed. Section [5] addresses the 
parallelization problem. At first sight it looks that Algorithm |4] can be easily 
distributed, but a deeper analysis uncovers new problems. We present a solution 
for these problems, and get a fully scalable method. As a practical evidence, we 
computed S'(10 e ) for all integers e < 36, whereas before, the largest known value 
of S(n) was for n = 10 17 |4I6| . For instance, the value 5'(10 36 ) was computed in 
88 hours using 256 processors. The detailed computation results are attached in 
Sect. E 

2 The 0(-\/n) algorithm 

We simply use Theorem [1] to compute S(n). In order to compute summation 
(fT]) we need to find the values of fj,(d) for d = 1,...,K, where K — [y/nj. 
This can be done in time 0(K log log K) and in memory 0(\/~K) using a sieve 
similar to the sieve of Eratosthenes [2]. See App. [B] for a detailed descrip- 
tion. This sieving algorithm tabulates values in blocks of size B = \^/~K\ ■ 
We assume we have the function TabulateMobiusBlock such that the call 
TABULATEMOBiusBLOCK(a, b) outputs the array mu containing the values of 
the Mobius function: /i(fc) = mu[k] for each k G (a, b]. This function works in 
time 0(&loglog&) and in memory O(max(vo, b — a)). Now, to calculate S(n), 
we split the interval [1, K] into 0(y/K) blocks of size B. It is presented in Algo- 
rithm [T] 

Summarizing, the basic algorithm has 0{yfn) time complexity and O(tfn) 
memory complexity. 



3 The New Algorithm 

The key point of discovering a faster algorithm is a derivation of a new formula 
from (fTJ) in Sect. 13.11 The new formula depends on the Mertens function (the 
Mobius summation function). Section 13.21 explains how one may compute the 
needed values. Section 1331 states the algorithm. In Sect. 13.41 the optimal values 



Algorithm 1 Calculating S(n) in time 0(y/n) and in memory O(tfn) 



s<-O,&<-O,.K"<-0(|ynJ),-B<- [VK] 
repeat 

a «- b, b 4- min(b + B, K) 

TABULATEMOBIUSBLOCK(a, b) 

for k — a + 1, . . . , b do 

$ 4- s + mu[k] ■ — 

end for 
until a > K 
return s 



of the algorithm parameters are estimated, and the resulting time complexity of 
<3(n 2 / 5 ) is derived. 



3.1 Establishing the New Formula 

To alter ([T]) we break the sum. We split the summation range [1, [_V^J] m t° two 
smaller intervals [1,-D] and (D, ly/n\}: 



S(n) = Si(n) + S 2 (n) 



where 



S 1 (n) = J2 M d ) 

l<d<D 

S 2 (n) = J2 A*(«0[£J 



rf2 



d>D 



We introduced a new variable 13. Optimal value of this variable will be deter- 
mined later. Sum S2(n) can be rewritten using Iverson's conventional: 



M = £m(4^J = ££[* 



The predicate in brackets transforms as follows: 



n 
rf2 



i/j,(d) 



(2) 



n 

rf2 



i < -77 < i + 1 



t + 1 



< d< 



To shorten the notation we introduce a new variable / and a new sequence xc 

for % = !,...,! . (3) 



[P] 



1 if P is true 
otherwise . 



The sequence Xi should be strictly decreasing. To ensure this, it is enough to set 
/ such that 



n 

> 1 



I -I V I 



'n> 



v^ > ViVT~ 7 T{v r [ + s/T^i) 

n>I(I-l)(Vl+y/l~T) 2 . (4) 

Because 

1(1 - 1)(VT + \/l~T) 2 <II- (2\/J) 2 = 4/ 3 , 

to satisfy Q, it is enough to set 

1 ~ \jl ■ (5) 

Suppose we set / satisfying (JS|). Now, we take D = xj and we use Xi notation 
© in ©: 

S 2 (n) = Y^ X^[' Tl+1 <d - x i\ i l x ( d ) = Yl i Y V{d) . (6) 

d>xj i l<z<7 Xi+i<d<Xi 

Finally, it is convenient to use the Mertens function: 

IxJ 

M(x)= J2 ^)=£m(») , (7) 

l<i<x i— 1 

thus we simplify ((6|) to: 

S 2 (n) = Y, *(**&) - M(x i+1 )) = (Y1 M ^) -V- 1 ) M ( x i) ■ ( 8 ) 

Theorem [2] summarizes the above analysis. 

[n~ 
Theorem 2. Let I be a positive integer satisfying I < \ —■ Let Xi - 

for i = 1, . . . , J and D = xi . Then S(n) = Si(n) + S^w), where 

D 

&(*) = £/*(«*) Ms J ' 

d=i 

S ^ n ) = fe M te)) -{I-1)M{X!) . 



3.2 Computing Values of the Mertens Function 

By applying the Mobius inversion formula to ([7]) we can get a nice recursion for 
the Mertens function: 



M(x) 



1 



d>2 



M 



© 



(9) 



Here, an important observation is that having all values M(x/d) for d > 2, we 
are able to calculate M(x) in time 0(y/x). This is because there are at most 
2^/x different integers of the form \x/d\ , since x/d < \fx for d > \fx. 

3.3 The Algorithm 

The simple algorithm exploiting the above ideas is presented in Algorithm [5] 



Algorithm 2 Efficient counting square-free numbers 



compute Si (n) and Mid) for d — 1, . . . , D 
for i = / — 1, . . . , 1 do 

compute M(xi) by $9$ 
end for 

compute S'2(n) by © 
return Si(n) + S2 (n) 



To compute M(Xi) (line[3]) we need the values M(xi/d) for d > 2. If Xj/d < £) 
then M(xi/d) was determined during the computation of S'i(n). If Xi/d > D then 
see that 



-d. 


= 


LVfJ 


= 


vT 


= 


/ n 


d 


d 



•^d 2 i 



(10) 



thus M(xi/d) = M(xj) for j = <i 2 i. Of course j < I, because otherwise \Jnj j < 
D. Observe that it is important to compute M(xi) in a decreasing order (lined]). 



3.4 The Complexity 

Let us estimate the time complexity of Algorithm^ Computing S\{n) has com- 
plexity O(DloglogD). 

Computing M(xi) takes 0{^fxl) time. The entire for loop (line[2H4|) has the 
time complexity: 



±o^-±o(f£) 



O tfn 



u 



(11) 



Using the asymptotic equality 



I— 1 v 



(fTTj) rewrites to: 

0(n^ 4 I 3 / 4 ) . 

The computation of 5*2 (n) is dominated by the for loop. Summarizing the 
time complexity of Algorithm [5] is 

0(.D log log L> + n 1/4 / 3/4 ) . (12) 

We have to tune the selection of / and D to minimize the expression (fT2"]) . The 
larger / we take the smaller D will be, thus the parameters / and D are optimal 
when 

0(D log log D) = 0(n 1/4 I 3/4 ) . 

This takes place for 

/ = n 1/5 (loglogn) 4/5 , 

and then 

0{D log log-D) = 0(n 1/4 I 3/4 ) = (9(n 2/5 (loglogn) 3/5 ) . 



Theorem 3. The time complexity of Algorithm^ is 0(n 2 ' 5 (log log n) 3 ' 5 ) = 
0(n 2 / 5 ) for I = n 1 / 5 (log log n) 4 / 5 = 0{n 1 ^). 

The bad news are the memory requirements. To compute M(xi) values we 
need to remember M{d) for all d = l,...,D, thus we need O(D) = 0(n 2 ^ 5 ) 
memory. This is even greater memory usage than in the basic algorithm. In the 
next section we show how to overcome this problem. 

4 Reducing Memory 

To reduce memory we have to process values of the Mdbius function in blocks. 
This affects the computation of needed Mertens function values which were pre- 
viously computed by the recursion ([9]) as described in Sect. 14.11 These values 
have to be computed in a more organized manner. Section 14.21 provides neces- 
sary utilities for that. Moreover in Sect. l4.3l some data structures are introduced 
in order to achieve a satisfying time complexity. Finally, Sect. 14.41 states the 
algorithm together with a short complexity analysis. 

4.1 Splitting into Blocks 

We again apply the idea of splitting computations into smaller blocks. To com- 
pute Si(n) we need to determine fj,(d) and M(d) for d = 1,...,D. We do it 
in blocks of size B = 0{\fD) by calling procedure TabulateMobiusBlock. 
That way we are able to compute S\(n), but to compute S2 (n) we face to the 
following problem. 

We need to compute M(xi) for integer i £ [1,/)- Previously, we memorized 
all needed M(l), . . . , M(D) values and used recursion ((9J. Now, we do not have 



unrestricted access to values of the Mertens function. After processing a block 
(a, b] we have only access to values M(k) for k <G (a,b]. We have to utilize these 
values before we switch to the next block. If a value M{k) occurs on the right 
hand side of the recursion @ for x — x, for some ie [1, 7), then we should make 
an update. 

The algorithm should look as follows. We start the algorithm by creating an 
array Mx: 

Mx\i] 4- 1 for i = 1, . . . , I - 1 . 

During the computation of Si(n) we determine M{k) for some k. Then, for every 
i G [1, 1) such that M(k) occurs in the sum 



£M(|) , (13) 



(14) 



d>2 

i.e. for every i E [1,1) such that there exists an integer d > 2 such that 

k 

we estimate the number of occurrences m of M(k) in (| 13[) and update 

Mx[i] 4- Mx[i] - m ■ M(k) . (15) 

After processing all k = 1, . . . , D, there remains to update Mx[i] by M(xi/d) for 
all [xi/d\ > D. With the help of equality (fT0|) it is enough to update Mx[i] by 
M(x ( j2 i ) for all d 2 i < I. After these updates we will have Mx[i] — M(xi). 

4.2 Dealing with Mx Array Updates 

The problem is how to, for given k, quickly find all possible values of i, that 
there exists an integer d > 2 fulfilling (|T4|) . There is no simple way to do it in 
expected constant time. Instead, for given i we can easily calculate successive k. 



Lemma 1. Suppose that for a given integer i £ [1,1) and an integer k there 
exists an integer d satisfying (|14p . Let us denote 



d a = 
d b = 



k + 1 



then 



(i) the number of occurrences m, needed for update (J15I) . equals d a — db, 
(ii) the next integer k satisfying (|14j) is for d — d , and it is equal to |_Xi/c4J . 



Proof. All possible integers d satisfying (fT4")) are: 

so (fT4|) is satisfied for d £ (db, d a ], and the next k satisfying (fl4| is for d = <if,. D 

Lemma [TJ for every i, allows us to walk through successive values of k, for which 
we have to update Mx[i], Since the target is to reduce the memory usage, we 
need to group all updates into blocks. Algorithm [3] shows how to utilize Lemma[T] 
in order to update Mx[i] for the entire block (a, &]. 



Algorithm 3 Updating Mx[i] for a block (a, b] 

Require: bounds < a < b, index i G [1,/), the smallest k 6 (a, b] that there exists 

d satisfying (Ti4l) 
Ensure: Mx[i] is updated by all M(k) for k £ (o, 6], the smallest k > b for the next 

update is returned 
1: function MxBLOCKUPDATE(a, 6, i, k) 

repeat 

Mx[i] <r- Mx[i\ - (d a - d b ) ■ M{k) 

\-d B \ 
d a <— db 
until k > b 
return k 
end function 



10 



4.3 Introducing Additional Structures 



Let B = [vD\ be the block size, and L = \D/B~\ be the number of blocks. 
We process k values in blocks (a , ai], (oi, 02], ■ ■ • , (az,-i, ol], where a/ = _BZ for 
< I < L and ar, = D. We need additional structures to keep track for every 
i G [1,-0 where is the next update: 

— mink[i] stores the next smallest k for which Mx[i] has to be updated, 

— ilist[l] is a list of indexes i for which the next update will be for k belonging 
to the block (ai,ai+i\. 

Using these structures we are able to perform every update in constant time. 
Once we update Mx[i] for all necessary k € (a;,a/+i] by MxBlockUpdate, we 
get next k > ai +1 for which the next update should be done. We can easily 
calculate the block index I' for this k and schedule it by putting i into ilist[l']. 



4.4 The Algorithm 

The result of the entire above discussion is presented in Algorithm @] We man- 
aged to preserve the number of operations, therefore the time complexity re- 
mained 0(n 2 / 5 ). Each of the additional structures has I = Ofji 1 ^) or L = 
0(VD) = Oin 1 / 5 ) elements. Blocks have size 0{B\ = 0{V~D) = din 1 / 5 ), 
Therefore, the memory complexity of Algorithm [4] is 0{n}/ 5 ). 



Algorithm 4 Calculating S(n) in time 0(n 2 ' 5 ) and in memory (^(n 1 ' 5 ) 



2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16: 

17 
18 

19 

20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
32 



B «- [V~D\ , L «- \D/B\ 



I «- e(n 1/5 (loglogn) 4/5 ),D ^ 

for I = 0, . . . , L - 1 do 

ilist[l] <- 
end for 
for j = 0, . . . , I — 1 do 

Mx[i] «- 1 

mink[i] 4— 1 

ifo«[0] <- ifoi[0] U {i} 
end for 
si ^0 
for I = 0, . . . ,L — 1 do > blocks processing loop 

TABULATEMOBIUSBLOCK(a;,a! + i) 

for fee (a;, . . . ,ai+i] do 

si «- Si + mu[k] ■ — 
end for 

compute M(k) for fc £ (oj, Oi+i] from values mii[fc] and M{a{) 
for each i £ «Mst[Z] do 

mink[i] <— MxBLOCKUPDATE(a!,ai+i,i, miniji]) 

I 777 1 T) AT 7 I 

?' <— — — > next block where Mx[i] has to be updated 

if I' < L and mvnk[i] < Xi then 

ilist[l'\ <- ili$t[l'} U {i} 
end if 
end for 
ilist[l] <r- 
end for 

for i — I — 1, . . . , 1 do > updating Mx[i] by M(fc) for k > D 

for all d> 2 such that d 2 i < / do 

Mz[i] <- Mx[i] - Mx[d 2 i] 
end for 
end for 

compute S2 — S2 (n) by (JHJ 
return si + S2 



/' 



Observe that most work is done in the blocks processing loop (lines ITTH25|) , 
because every other part of the algorithm takes at most Oin 1 / 5 ) operations. Ini- 



tialization of structures (lines [TUTU]) is proportional to their size din 1 / 5 ). Com- 
puting S 2 (ri) by (JSJ (line EH) takes 0(1) = 0(r7, 1/5 ). Only the time complexity 
of the part responsible for updating Mx[i] by M{k) for k > D (lines I!?61-I5U|) is 
unclear. The total number of updates in this part is: 



i=l 2<d i=\ 

d 2 i<I 



l Vi 



thus it is 0(7) = Oin 1 / 5 ). 



5 Parallelization 

As noted in Sect. 14.41 the most time consuming part of Algorithm [4] is the 
blocks processing loop. The basic idea is to distribute calculations made by this 
loop between P processors. We split the interval [1,-D] into a list of P smaller 
intervals: (oo, a\], (oi, a 2 ], . . . , (ap_i, ap], where = a < a 1 < ■ ■ ■ < ap = D. 
Processor number p, < p < P, focus only on the interval (a p , a p +i], and it is 
responsible for 

(i) calculating part of the sum Si (n) 

E ^ fc ) ■ LtJJ ■ ( 17 ) 

k€(a p ,a p+1 ] 

(ii) making updates of the array Mx[l, ...,/— 1] for all k € (a p , a p +i]- 

All processors share Si value and Mx array. The only changes are additions of 
an integer, and it is required that these changes are atomic. Alternatively, a 
processor can collect all changes in its own memory, and, in the end, it only once 
change the value si and each entry of Mx array. 

Although the above approach is extremely simple, there are two drawbacks. 
First, for updates (|n|), a processor needs to calculate successive values of the 
Mertens function: M(a p + 1), . . . , M(a p+ \). Computation of (fT7| produce suc- 
cessive values of the Mobius function starting from fi(a p + 1), therefore the 
Mertens function values can be also computed if only we knew the value of 
M(a p ). Unfortunately, there is no other way than computing it from scratch. 
However, to compute M(x) there is an algorithm working in time 0(x 2 ' 3 ) and 
memory O^x 1 ' 3 ). See for instance [2], or [5] for a simpler algorithm missing a 
memory reduction. 

In our application we have x < D — 0(n 2 ' 5 ), therefore cumulative additional 
time we spend in computing Mertens function values from scratch is 0(PD 2 / 3 ) = 
0(Pn 4 / 15 ). We want this does not exceed the targeted time of (5(n 2 / 5 ), therefore 
the number of processors is limited by: 

P = 0(^)=0(n^). (18) 



Second drawback comes from an observation that the number of updates of 
Mx array is not uniformly distributed on k <E [1 , D) . For example for k < \f~D = 
(^(n 1 / 5 ) for every i <G [1,-0 there always exists d>2 such that (|T3]) is satisfied, 
therefore for every such k there will be I— 1 = (^(n 1 / 5 ) updates. It means that in 
a very small block (1, [V^DJ] there will be 0(n 2 / 5 ) updates, which is proportional 
to the total number of updates. We see that splitting into blocks is non-trivial 
and we need better tools for measuring work in the blocks processing loop. 

Let t s be the average time of computing a single summand of the sum S\ (n) , 
and let t u be the average time of a single update of Mx array entry. Consider a 
block (0, a]. Denote as U(a) the number of updates which must be done in this 
block. Then the expected time of processing this block is 

T(a) = t s a + t u U(a) . (19) 

It shows up that U(a) can be very accurately approximated by a closed formula: 

'la for a< yf 



U ( a ) ~ \l n o n'/ 2 / 1 / 2 , 8„l/4 /3 /4 for 4/7T < ft < £, = /n ( 20 ) 



^ 3 ~a? ^ a '3 x Awi V / 

See App. [C]for the estimation. 

The work measuring function (|T9|) says that the amount of work for the block 
(a p , a p+ i] is T(a p+ i) — T(a p ). Using this we are able to distribute blocks between 
processors in a such way, that the work is assigned evenly. 

6 Results 

We calculated 5'(10 e ) for all integer < e < 36. In App. [Dl the computed values 
are listed. First, for e < 26 we prepared the results using Algorithm[TJ the simpler 
and slower algorithm. Then we applied Algorithm [4] on a single thread. Thus we 
verified its correctness for e < 26 and we prepared further values for e < 31. 

Finally, we used parallel implementation for 24 < e < 36. The computations 
were performed in ICM UW under grant G43-5 on the cluster Halo2. See [T] 
for a specification. The results for e < 31 agreed with the previously prepared 
results. The timings of these computations are presented in Table [TJ 

Computation time is calendar time in seconds of cluster occupation. Ideal 
time represents how long computations could take, if communication between 
processors was ignored and if the work was distributed equally. This was cal- 
culated by taking cumulative time of the actual work done for each processor 
and dividing by the number of processors. We see that ideal time is close to 
computation time showing an experimental evidence of scalability of the parallel 
algorithm. 
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A Proof of Theorem Q] 

Let our universe be all positive integers less or equal to n: 

U = {l,...,n}. (21) 

For a prime integer p, let us define a set A p : 

A p = {a£ U :p 2 divides a} . (22) 

Complement of set A p represents a set of all integers less or equal to n not 
divisible by p 2 . We want to count integers not divisible by any prime square, 
therefore the number we are searching for is the size of the set f] pr ime^P- ^v 



the inclusion-exclusion principle we have: 

fl a~ p =\u\- y, \ a p\+ E \ A P nA ? 



p<q 
p.q prime 



- J2 \A P nA q nA r \ + ... 

p<q<r 
p,q,r prime 

oo 

=£(-i) 4 E \A P1 n---nA Pi 



(23) 



;=o 



pi<-<Pi 

pi,...,Pi prime 



Now, observe that 



2 2 

Pi' ■•■■Pi 



\A Pl n---nA Pi \ = 

Using the Iverson bracket we can write (f23|) as 
d2S| =YJ2^ d = Pl ■ ■■■■Pi / \Pi < ■■■ < Pi A Pi,---, Pi primc](-l) 1 -^J 



d i 



The expression [d is a product of i distinct primes] (— l) 1 means the Mobius func- 
tion n(d) in other words, therefore we get the final formula of Theorem Q] 

B Computing the Mobius Function 

To compute values of the Mobius function we exploit the following property: 



M*) = 



if p 2 divides k , 

(-l) e if k =pi ■ ... ■ p e 



(24) 



Using a sieve we can find values of n(k) for all k = 1, . . . ,K simultaneously, 
where K = y/n, as presented in Algorithm [5] To generate all primes less or 
equal to K we can use the sieve of Eratosthenes. The memory complexity is 
0{K) and the time complexity is 0(K log log K). 

The above method could be improved to fit 0(y/~K) memory by tabulating 
in blocks. We split the array mu to blocks of the size B = Q(\fK), and for each 
block we tabulate /i(-) separately using Algorithm |6] For each block we use only 
primes less or equal to yK, and we need only 0(y/~K) memory. There is at most 
K/B = 0(\f~K) blocks. Therefore, for each block the number of operations is 

0(Vk)+ Y ( 1 + — + -) =0(y/K+B\og\ogK) = 0(VK loglogK) , (25) 
p <Vk 



which results in 0(K \og\ogK) time complexity for the whole algorithm. 



Algorithm 5 Computing values of the Mobius function: the basic approach 

Require: bound 1 < K 

Ensure: fj,(k) = mu[k] for k — 1, . . . , K 

1: procedure TabulateMobius(A') 

2: for k = 1,..., K do 

3: mu[k] ■£- 1 

4: end for 

5: for each prime p < K do 

6: for each k £ [1,K] divisible by p 2 do 

7: mu[k] <- 

8: end for 

9: for each k £ [1, K] divisible by p do 
10: mu[k] < mu[fc] 



11 
12 
13 



end for 
end for 
end procedure 



C Estimating U(a) 

Instead of computing the exact number of updates in a block (0, a], we will 
compute an approximation of the expected number of updates as follows. Let us 
fix k £ (0, a] and x = Xi for some i £ [1,1). The probability that there exists d, 
such that (fl"4]) is satisfied, equals 

If" ITT forfc>Vi. 

Let us define U(a,Xi) as the expected number of updates of entry Mx[i] for all 
k £ (0, a]. Let x = Xi. If a < \fx then 



a 



U{a,x) = J2 p ( k >x) = X> = a > 

fe=i fc=i 

and if a > y/x then 

u M - T. i + £ (l-^-LVij + ^-f)^^ ^ 

thus U(a,x) can be presented as the formula: 

rw n I a for a < Vx , ,„„ N 

[/a,z =<L x ~ v (26) 

I 2^/x— | for a > v^ . 

Now we are ready to compute U(a): 

U(a)= J2 u (a^i) ■ ( 27 ) 



Algorithm 6 Computing values of the Mobius function: memory efficient sieving 

in blocks 

Require: bounds < a < b 
Ensure: fi{k) = mu[k] for each k G (a, b] 
1: procedure TABULATEMOBlusBLOCK(a, b) 
2: for each k G (a, b] do 

3: mu[k] <— 1 

4: m[k] <— 1 > multiplicity of all found prime divisors of k 

5: end for 

6: for each prime p < y/b do 
7: for each k G (a, b] divisible by p 2 do 

mw[fc] <— 
end for 
for each k £ (a, b] divisible by p do 

mu[k] < mu[k] 

m[k] 4— m[k] ■ p 
end for 
end for 
for each fc G (a,b] do 

if m[k] < k then > k — m[k] ■ q, where q is prime and q > yb 



9 

10 
11 
12 
13 
14 
15 
16 

17: 

18 
19 

20 



mu[k\ < mu[i 

end if 
end for 
end procedure 



Expanding a term U(a, xi) using (|26|) depends on the inequality: 



/ n n n 

a < ^/Xi -i==^> a< \ \ - <=^ a < {/ - «=>• i < — . 
y V % V l a 

Therefore U(a, xi) always expand to a if a < \/~j, so then U(a) = la, and this 
is the first case of ([20| . Otherwise, if a > v/j j we split the summation (1271) : 

^ U(a,Xi)= ^ U(a,Xi)+ ^ U(a,Xi) 

l<i<I 1<*<"% ^T <i<7 

n v— * ( ~ ,fn 1 m 

Now, we apply the following approximation formulas for sums: 

fc=i d 



^fc- 1 /2« 2a ; 1 / 2 , 



fe=i 



and as a result we get the second case of ([201) : 

"«-$+*" t' m -&")- l r>< Ill2 -&' 2 

i „ „l/2 7-1/2 o 

3r a 3 

D S(n) values for powers of 10 



e 


5(10 e ) 





1 


1 


7 


2 


61 


3 


608 


4 


6083 


5 


60794 


6 


607926 


7 


6079291 


8 


60792694 


9 


607927124 


10 


6079270942 


11 


60792710280 


12 


607927102274 


13 


6079271018294 


14 


60792710185947 


15 


607927101854103 


16 


6079271018540405 


17 


60792710185403794 


18 


607927101854022750 


19 


6079271018540280875 


20 


60792710185402613302 


21 


607927101854026645617 


22 


6079271018540266153468 


23 


60792710185402662868753 


24 


607927101854026628773299 


25 


6079271018540266286424910 


26 


60792710185402662866945299 


27 


607927101854026628664226541 


28 


6079271018540266286631251028 


29 


60792710185402662866327383816 


30 


607927101854026628663278087296 


31 


6079271018540266286632795633943 


32 


60792710185402662866327694188957 


33 


607927101854026628663276901540346 


34 


6079271018540266286632767883637220 


35 


60792710185402662866327677953999263 


36 


607927101854026628663276779463775476 


6 


0.60792710185402662866327677925836583342615264 . . . 



Table 2. Values of S{10 e ) for < e < 36 



